INTRODUCTION

Lung cancer continues to be the leading cause of cancer-related death in both men and women
in the United States 1. The majority of lung cancers are non-small cell lung cancers (NSCLCs),
which include squamous cell carcinomas (SCCs) and adenocarcinomas 2. Lung cancer mortality
is high in part because most cancers are diagnosed after regional or distant spread of the
disease has already occurred, and in part because reliable biomarkers for early detection and
risk assessment are lacking 2. The identification of new, effective early biomarkers will improve
clinical management of lung cancer and is linked to a better understanding of the molecular
events associated with the development and progression of the disease.

It has been suggested that histologically normal-appearing tissue adjacent to neoplastic lesions
displays molecular abnormalities, some of which are shared with the tumors 3. This
phenomenon, termed field of cancerization, was later shown to be evident in various epithelial
cell malignancies, including lung cancer 4,5. Loss of heterozygosity (LOH) events are frequent in
cells obtained from bronchial brushings of normal and abnormal lungs from patients undergoing
diagnostic bronchoscopy and were detected in cells from the ipsilateral and contralateral lungs
6. More recently, global mRNA expression profiles have been described in the normal-appearing
bronchial epithelium of healthy smokers 7. In addition, modulation of global gene
expression in the normal epithelium of healthy smokers is similar in the large and small airways,
and the smoking-induced alterations are mirrored in the epithelia of the mainstem bronchus and
the buccal and nasal cavities 8. Finally, our group has previously shown that gene-expression
profiles in cytologically normal mainstem bronchus epithelium can distinguish smokers with and
without lung cancer and can serve as an early diagnostic biomarker for lung cancer 9.

In this program, in Specific Aim 1, we will extend our work in this field by spatially mapping the
molecular  field  of  injury  associated  with  smoking-related  lung  cancer.  In  smokers  undergoing 
resection  of  lung  lesions,  high-throughput  mRNA  expression  analyses  are  being  performed  on 
cytological  specimens  (brushings)  obtained  at  intraoperative  bronchoscopy  from  the  nasal 
epithelium,  main  carina  and  ipsilateral  and  contralateral  proximal  and  distal  bronchi  (relative  to 
the  location  of  the  resected  lung  lesion),  as  well  as  on  specimens  obtained  at  lobectomy  from 
sub-segmental  bronchus  (adjacent  to  tumor)  and  from  the  resected  NSCLC  tumors.  Towards 
this  aim,  we  are  comparing  and  contrasting  global  gene  expression  patterns  across  all  the 
specimens  from  the  entire  field  and  corresponding  NSCLC  tumors.  We  are  currently  performing 
RNA-sequencing  and  microarray  profiling  of  nasal  epithelia,  airway  epithelial  cells  collected  from 
both  bronchoscopy  and  lobectomy  specimens  as  well  as  of  corresponding  tumors  (NSCLC 
patients)  or  benign  lesions  (cancer-free  individuals). 


In  Specific  Aim  2,  we  are  using  laser  capture  microdissection  to  obtain  specific  cell  populations 
(basal  cells  or  type  II  alveolar  cells,  depending  on  the  NSCLC  histology/location)  as  well  as 
premalignant  lesions  and  epithelial  components  of  the  tumors.  These  cell  populations  are  being 
profiled  with  RNA-seq  to  determine  their  gene  expression  signatures  to  increase  our 
understanding  of  premalignancy.  We  are  analyzing  the  gene  expression  profiles  that  are 
associated  with  progression  from  a  benign  cell  population  to  premalignancy  and  with 
progression  from  a  benign  cell  population  to  true  malignancy. 

In  future  studies,  in  Specific  Aim  3,  we  will  use  expression  signatures  and  biomarkers  derived 
from  the  results  of  aims  1  and  2  to  develop  and  test  airway-based  biomarkers  capable  of 
diagnosing  lung  cancer  in  current  or  former  smokers  using  minimally  invasive  sites. 

This  report  details  the  progress  made  during  the  second  year  of  research. 
Molecular Profiles for Lung Cancer Pathogenesis and Detection in U.S. Veterans

Specific Aim 1: To increase our understanding of the molecular basis of the
pathogenesis of lung cancer in the “field cancerization” that
develops in current and former smokers.

Summary  of  Research  Findings 

A.  Collection  of  airway  epithelial  samples  from  both  bronchoscopy  and  lobectomy 
specimens  from  smokers  with  and  without  lung  cancer  (Sub-specific  Aims  1A  and  1C): 

We  have  recruited  35  study  participants  undergoing  resection  of  lung  tumor  or  benign  lung 
lesions to collect tissue samples for the studies in Aim 1. From these subjects, who were
recruited  at  all  4  participating  institutions,  we  have  collected  nasal  epithelium,  proximal  and 
distal  bronchial  airway  epithelium  obtained  at  bronchoscopy  (ipsilateral  and  contralateral  to  the 
tumor)  as  well  as  the  tumor/benign  lesion,  adjacent  normal  parenchyma,  and  subsegmental 
bronchial  epithelium  at  time  of  lobectomy.  A  summary  of  subjects  recruited  at  all  4  sites  is 
provided  in  Table  1  and  their  demographics  are  shown  in  Table  2. 

The samples are currently being analyzed by both next-generation RNA sequencing (RNA-Seq)
using the Illumina HiSeq 2000 platform and microarray profiling using the Human Gene 2.0 ST
platform from Affymetrix. RNA-Seq and microarray analysis are being performed at BU and MD
Anderson Cancer Center, respectively. Total RNA from all samples has been isolated using
the miRNeasy kit from Qiagen. RNA sequencing will facilitate the discovery of novel transcripts
in the molecular field of injury as well as the quantification of transcripts that cannot be
characterized by microarray technology. This study, for the first time, will allow us to 1) perform
next-generation sequencing in addition to microarray profiling analysis of the molecular field of
injury in the airway; 2) study samples obtained from four different institutions in the nation using
common SOPs; and 3) characterize the complete topological map of the molecular field of
injury/cancerization in both NSCLC patients and cancer-free individuals. We anticipate
that RNA-Seq and microarray profiling will be completed by the end of the year, with subsequent
bioinformatic and functional analysis, along with validation of expression studies, completed by
Spring 2013. We anticipate that expression profiles in the NSCLC molecular field of injury will
harbor transcripts, both novel and established, that may exhibit potential for use as airway
biomarkers that can be developed and tested for lung cancer detection using minimally invasive
sites in Specific Aim 3 of this award.
Table 1. Molecular mapping of the field of injury in NSCLC and cancer-free patients

| Institution | RNA-Seq (cases): ADC | RNA-Seq (cases): SCC | RNA-Seq (cases): No Cancer | Microarray (cases): ADC | Microarray (cases): SCC | Microarray (cases): No Cancer |
|---|---|---|---|---|---|---|
| MD Anderson | 4 | 2 | 0 | 4 | 3 | 0 |
| BU | 2 | 1 | 3 | 0 | 2 | 4 |
| UCLA | 2 | 2 | 1 | 3 | 2 | 2 |
| Vanderbilt | 1 | 1 | 1 | 4 | 3 | 1 |
| Number | 9 | 6 | 5 | 11 | 10 | 7 |

| | RNA-Seq | Microarray |
|---|---|---|
| Total no. of cases analyzed | 20 | 28 |
| Total no. of samples analyzed | 156 | 183 |

RNA-Seq, RNA sequencing; ADC, adenocarcinoma; SCC, squamous cell carcinoma; BU,
Boston University; UCLA, University of California Los Angeles.


Table 2. Demographics of study participants

| Ethnicity | Male | Female |
|---|---|---|
| White | 14 | 5 |
| Black | 7 | 3 |
| Hispanic | 1 | 0 |
| Asian | 2 | 1 |
| American Indian | 0 | 0 |
| Other | 1 | 0 |
| Unknown | 0 | 0 |

Tissue  collection: 

The collection protocol SOP has been put in place. Table 3 shows the samples collected at
Vanderbilt University, including tumor, brushings from large and small airways, and areas of
normal lung. Through the collaboration between Dr. Massion (PI) and Dr. Eisenberg (pathologist),
the process is in place, and we anticipate enrolling at least 20 patients per year in this protocol.
Table 3. Samples collected from 10 patients and shipped to BU for RNA extraction and sequencing analysis.

| No. | Patient ID | LCBID | Sample Type | Fixative | Sample ID | Storage Temperature (°C) | Location of Brushes |
|---|---|---|---|---|---|---|---|
| 1 | 8841 | 2012-4-1-800-1 | Normal Tissue | RNA Later | 534048 | -80 | Tumor is central - LUL |
| | | | Normal Tissue | | 534049 | -80 | B1 closest to tumor |
| | | | Tumor Tissue | RNA Later | 534050 | -80 | B2 same bronchus as B1 - peripheral |
| | | | Tumor Tissue | | 534051 | -80 | B3 and B4 - different airway |
| | | | Brushes B1-B4 | Qiazol | 534052-55 | -80 | |
| 2 | 8836 | 2012-4-1-802-1 | Normal Tissue | RNA Later | 534293 | 4 | Tumor is peripheral (s/p chemorad) - RUL |
| | | | Normal Tissue | | 534294 | -80 | B3 closest to tumor |
| | | | Tumor Tissue | RNA Later | 534295 | 4 | B2 same airway as B3 - more proximal |
| | | | Tumor Tissue | | 534296 | -80 | B1 different airway - proximal |
| | | | Brushes B1-B3 | Qiazol | 534299-301 | -80 | Scant tumor left on specimen after chemorad |
| 3 | 8836 | 2012-4-1-803-1 | Normal Tissue | RNA Later | 534326 | 4 | Tumor is central - LUL |
| | | | Normal Tissue | | 534327 | -80 | B1 closest to tumor |
| | | | Tumor Tissue | RNA Later | 534329 | 4 | B2 same airway as B1 - distal |
| | | | Tumor Tissue | | 534328 | -80 | B3 different airway |
| | | | Brushes B1-B3 | Qiazol | 534330-32 | -80 | |
| 4 | 9002 | 2012-5-1-811-1 | Normal Tissue | RNA Later | 535568 | 4 | Tumor is peripheral - RLL |
| | | | Normal Tissue | | 535569 | -80 | B4 closest to tumor |
| | | | Tumor Tissue | RNA Later | 535570 | 4 | B1 and B2 same airway as B4 - more proximal |
| | | | Tumor Tissue | | 535571 | -80 | B3 different airway - proximal |
| | | | Brushes B1-B4 | Qiazol | 535572-75 | -80 | |
| 5 | 9006 | 2012-5-1-814-1 | Normal Tissue | RNA Later | 535692 | 4 | Tumor is central - LUL |
| | | | Normal Tissue | | 535690 | -80 | B1 closest to tumor |
| | | | Tumor Tissue | RNA Later | 535693 | 4 | B2 and B4 same airway as B1 - distal |
| | | | Tumor Tissue | | 535691 | -80 | B3 different airway |
| | | | Brushes B1-B4 | Qiazol | 535694-97 | -80 | |
| 6 | 9047 | 2012-6-1-821-1 | Normal Tissue | RNA Later | 536361 | 4 | Tumor is central - LUL |
| | | | Normal Tissue | | 536360 | -80 | B1 is closest to tumor |
| | | | Tumor Tissue | RNA Later | 536363 | 4 | B2 on a different airway than B1, distal |
| | | | Tumor Tissue | | 536362 | -80 | B3 on an opposite airway, distal |
| | | | Brushes B1-B3 | Qiazol | 536364-66 | -80 | |
| 7 | 9078 | 2012-6-1-824-1 | Normal Tissue | RNA Later | 536782 | 4 | Tumor is peripheral - LLL |
| | | | Normal Tissue | | 536781 | -80 | B1 is distal to tumor |
| | | | Tumor Tissue | RNA Later | 536784 | 4 | B2 is closest to tumor on same airway as B1 |
| | | | Tumor Tissue | | 536783 | -80 | B3 is on a different airway, distal |
| | | | Brushes B1-B3 | Qiazol | 536785-87 | -80 | |
| 8 | 9138 | 2012-7-1-828-1 | Normal Tissue | RNA Later | 537331 | 4 | Tumor is peripheral - LUL |
| | | | Normal Tissue | | 537330 | -80 | B1 is closest to tumor |
| | | | Tumor Tissue | RNA Later | 537329 | 4 | B2 on same airway as B1, distal |
| | | | Tumor Tissue | | 537328 | -80 | B3 is on a different airway, distal |
| | | | Brushes B1-B3 | Qiazol | 537332-34 | -80 | |
| 9 | 9258 | 2012-8-1-840-1 | Normal Tissue | RNA Later | 538648 | 4 | Tumor is peripheral - LLL |
| | | | Normal Tissue | | 538649 | -80 | B2 is closest to tumor on the same airway as B1 |
| | | | Tumor Tissue | RNA Later | 538646 | 4 | B3 and B4 are distal to tumor on different airway |
| | | | Tumor Tissue | | 538647 | -80 | |
| | | | Brushes B1-B4 | Qiazol | 538650-653 | -80 | |
| 10 | 9401 | 2012-9-1-848-1 | Normal Tissue | RNA Later | 539677 | 4 | Tumor is peripheral - RUL |
| | | | Normal Tissue | | 539678 | -80 | B1 is proximal to tumor on same airway as B2 |
| | | | Tumor Tissue | RNA Later | 539675 | 4 | B2 is distal to tumor |
| | | | Tumor Tissue | | 539676 | -80 | B3 is distal to tumor on different airway |
| | | | Brushes B1-B3 | Qiazol | 539679-681 | -80 | |
Specific Aim 2: To increase our understanding of the role of tumor-initiating
stem/progenitor cells in the pathogenesis of lung cancer in the “field cancerization” that
develops in current and former smokers.

Summary  of  Research  Findings: 

A.  Feasibility  of  sequencing  small  amounts  of  RNA  from  laser  captured  samples  that 
reflect  different  pathologic  stages  of  lung  carcinogenesis  (Sub-specific  Aim  2B): 

Specific regions of normal basal cells, premalignant metaplastic/dysplastic cells, and squamous
carcinoma cells were successfully selected by laser microdissection. Adequate amounts of RNA
were isolated from these cells for library preparation and high-throughput sequencing (RNA-seq),
and sufficient quantities of libraries with appropriate size ranges were generated. Both steps
were described and detailed in the previous annual report (Year 1). Samples were sequenced on
Illumina Genome Analyzer IIx or HiSeq 2000 instruments, producing single-end reads with quality
control Phred scores above 30.


Sequence  alignment  and  quantification  of  gene  expression 

The reads produced by sequencing each RNA sample were aligned to the human genome
(build hg19). Uniquely aligned reads were then used to compute gene expression estimates by
measuring the coverage of each of ~35,000 Ensembl Gene loci using the coverageBed utility in
the BEDTools suite. This workflow is illustrated in Figure 1.
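
The coverage step can be pictured with a small Python sketch. This is not the BEDTools implementation and not the code used in the study; it only illustrates the idea of counting uniquely aligned reads that overlap each gene locus. The gene names, coordinates, and the function name `count_reads_per_gene` are hypothetical.

```python
from collections import defaultdict


def count_reads_per_gene(reads, genes):
    """Count reads overlapping each gene locus.

    reads: iterable of (chrom, start, end) half-open intervals
    genes: dict mapping gene_id -> (chrom, start, end)
    """
    # Group gene loci by chromosome so each read is only compared
    # against loci on its own chromosome.
    by_chrom = defaultdict(list)
    for gene_id, (chrom, start, end) in genes.items():
        by_chrom[chrom].append((start, end, gene_id))

    counts = {gene_id: 0 for gene_id in genes}
    for chrom, r_start, r_end in reads:
        for g_start, g_end, gene_id in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts
            # before the other ends.
            if r_start < g_end and g_start < r_end:
                counts[gene_id] += 1
    return counts


# Hypothetical loci and 36-base reads, for illustration only.
genes = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr1", 500, 800)}
reads = [
    ("chr1", 90, 126),
    ("chr1", 150, 186),
    ("chr1", 790, 826),
    ("chr2", 100, 136),
]
counts = count_reads_per_gene(reads, genes)
print(counts)
```

A linear scan per chromosome keeps the sketch short; at the scale of tens of millions of reads a sorted or indexed interval structure would be needed.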

Identification  of  genes  with  progression-associated  expression  patterns 
To identify genes whose expression was associated with progression from normal to
intermediate (metaplastic or dysplastic) to tumor cells, a three-step procedure was used in Dr.
Spira’s lab. First, genes with low expression (median expression below 100 reads), which
are more likely to yield false positives, were removed from the analysis. Next, to simplify the
analysis, a concordance filter was applied so that only genes whose expression changed in the
same direction in both intermediate and tumor cells relative to normal cells were considered.
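
The two filters described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the lab's actual code: the median is assumed to be taken across all samples of a gene, the direction of change is assumed to be judged by comparing group means, and the gene names and read counts are invented.

```python
from statistics import mean, median

MIN_MEDIAN_READS = 100  # low-expression threshold stated in the text


def is_concordant(normal, intermediate, tumor):
    """True when intermediate and tumor both move the same way vs. normal."""
    d_intermediate = mean(intermediate) - mean(normal)
    d_tumor = mean(tumor) - mean(normal)
    # A positive product means both differences share a sign.
    return d_intermediate * d_tumor > 0


def filter_genes(expression):
    """expression: gene -> {'normal': [...], 'intermediate': [...], 'tumor': [...]}"""
    kept = []
    for gene, groups in expression.items():
        all_counts = groups["normal"] + groups["intermediate"] + groups["tumor"]
        # Filter 1: drop genes whose median read count is below the threshold.
        if median(all_counts) < MIN_MEDIAN_READS:
            continue
        # Filter 2: keep only genes changing in one direction in both stages.
        if is_concordant(groups["normal"], groups["intermediate"], groups["tumor"]):
            kept.append(gene)
    return kept


# Invented read counts for three hypothetical genes.
expression = {
    "GENE_UP": {"normal": [120, 130], "intermediate": [300, 320], "tumor": [600, 650]},
    "GENE_LOW": {"normal": [5, 8], "intermediate": [20, 30], "tumor": [40, 50]},
    "GENE_MIXED": {"normal": [400, 420], "intermediate": [200, 210], "tumor": [800, 820]},
}
kept = filter_genes(expression)
print(kept)
```

Here GENE_LOW fails the expression filter and GENE_MIXED fails the concordance filter, so only GENE_UP is retained.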

B.  Identification  of  additional  archived  clinical  specimens  for  laser  microdissection  of 
tumor-initiating  stem/progenitor  cells: 

Additional  archived  clinical  specimens  from  which  we  previously  extracted  DNA  and  RNA  from 
rare  target  cells  using  laser  microdissection  were  identified.  Within  the  same  individual,  the 
following  regions/cells  were  present:  normal  epithelium,  normal  basal  cells,  dysplastic  cells  in 
preinvasive  tissue,  as  well  as  their  respective  basal  cells,  and  invasive  tumor  cells.  The  cases 
identified  included  biopsy  specimens  and  resected  tumors,  and  each  was  confirmed  to  be 
amenable for microdissection after review of freshly cut H&E stained slides by our collaborating
pathologists. Using archived clinical specimens available in biorepositories at each site, we
have sufficient material to complete the studies described in sub-specific Aim 2B.

[Figure 1 flowchart: samples from n=2 patients were sequenced on the Genome Analyzer IIx (36-base, single-end) and samples from n=2 patients on the HiSeq 2000 (50-base, single-end). Reads were aligned with Bowtie/TopHat, giving ~10-15 million and ~50-60 million uniquely aligned reads per sample, respectively. Gene-level expression of ~35,000 Ensembl Gene (ENSG) IDs was then computed with coverageBed (BEDTools suite).]

Figure 1. RNA sequencing workflow. Single-end reads were
aligned to the human genome and gene expression was
quantified by measuring the overlap with Ensembl Gene IDs.

In collaboration with Dr. Gomperts, we provided a series of preinvasive lesions. Our knowledge
about the field of cancerization is limited, and understanding the molecular determinants of
tumor development is critical to this area of research. The aim of our collaboration with Dr.
Gomperts’ laboratory is to compare molecular profiles of tumor-initiating stem/progenitor cells
from normal airway epithelium, preinvasive lesions, and invasive lung tumor tissues, and to
evaluate the role of airway epithelium tumor-initiating stem/progenitor cells in lung cancer
pathogenesis in current and former smokers. The hypothesis is that the airways of lung cancer
patients have a greater population of cells with stem/progenitor-like characteristics, a
population that we could also find in selected individuals at high risk of developing lung cancer.
Selection and molecular characterization of this subpopulation may lead to the identification of
candidate biomarkers that are important for understanding early events of lung cancer
pathogenesis. This may help identify persons at highest risk of developing lung cancer and
could potentially translate this knowledge into new therapeutic targets.

Specifically, we selected tissue specimens from our archived materials from which DNA and
RNA could be extracted, using laser capture microdissection, from normal epithelium,
normal basal cells, and dysplastic cells in preinvasive tissue as well as their respective basal
cells, for comparison with invasive tumor cells from the same individuals. From the inventory of
the VUMC biorepository, we were able to find biopsy specimens and resected tumor tissues that
were amenable to microdissection. Dr. Eisenberg, the pathologist in our group at VUMC, and the
Gomperts laboratory have reviewed each H&E stained slide. The specimens sent
to the Gomperts laboratory included slides and tissue blocks of three biopsies and one resected
tumor, as described in Table 4.


Table 4. Samples shared with Dr. Gomperts.

| Date of shipment | ID | Description | Unstained slides / H&E slides / Tissue blocks (counts as listed) |
|---|---|---|---|
| 6/15/2010 | 2006-9-1-19-16 | Biopsy, Main stem | 16, 1 |
| | 2007-3-1-435-9 | Biopsy, Left main stem lesion | 16, 1 |
| | 2007-4-5-466-9 | Biopsy, Lesion | 16, 1 |
| | 2005-8-1-240-1 | Resected tumor | 16, 1 |
| 1/25/2011 | 2006-9-1-19-16 | Biopsy, Main stem | 1 |
| | 2007-3-1-435-9 | Biopsy, Left main stem lesion | 1 |
| | 2007-4-5-466-9 | Biopsy, Lesion | 1 |
| | 2005-8-1-240-1 | Resected tumor | 1 |
| 5/10/2012 | S05-23133, 9A | Resected tumor | 2, 1 |
| | S05-23133, 9E | Resected tumor | 2, 1 |
| | S05-2SGQ5, 7A | Resected tumor | 2, 1 |
| | S05-2SGQ5, 7B | Resected tumor | 2, 1 |
| | S09-11290, 2C | Resected tumor | 2, 1 |
| | S09-11290, SA | Resected tumor | 2, 1 |
| | S09-11290, SJ | Resected tumor | 2, 1 |
| | S09-11290, SK | Resected tumor | 2, 1 |
| 6/25/2012 | S05-23133, 9A | Resected tumor | 10 (2 sections/slide) |
| | S05-23133, 9E | Resected tumor | 20 (1 section/slide) |
C. Proteomic Studies

As previously described, the UCLA group identified a molecular profile dominated by the Snail
transcription factor that appears to drive epithelial-mesenchymal transition (EMT) and
tumor-initiating characteristics in the airway epithelium, as modeled in vitro and in vivo. Snail is
also over-expressed in human bronchial epithelial cells in premalignant lesions in situ, concomitant
with markers of EMT and stemness. To validate the technology for analyzing tumor-initiating
stem cells from in situ specimens, we performed preliminary in vitro experiments to assess the
impact of this transcription factor on protein expression profiles of human bronchial epithelial
cells. In this context, we performed shotgun proteomic analysis comparing human bronchial
epithelial cells and the same cells ectopically over-expressing Snail.

Cell  pellets  were  collected  at  UCLA  and  prepared  for  LC-MS/MS  shotgun  proteomics  and 
analyzed  in  the  Jim  Ayers  Institute  at  Vanderbilt,  as  described  in  our  year  1  progress  report. 
Briefly,  each  0.2  mg  protein  aliquot  was  digested  and  resolved  by  isoelectric  focusing  into  15 
fractions  that  were  subsequently  analyzed  by  LC-MS/MS.  Thus,  there  were  6  measurements  (2 
technical  replicates  for  3  samples)  for  the  control  group  and  6  for  the  Snail+  group.  Raw  MS/MS 
data were evaluated using MyriMatch and IDPicker software. Differentially expressed proteins
were then identified using QuasiTel pairwise comparison.

The initial dataset was robust, with 2809 protein groups identified overall; a protein group
usually represents a single protein, but it is occasionally a small group of indistinguishable
proteins with identical peptides. The overall numbers of protein groups in the control and Snail+
bronchial epithelial cells were similar (2229 and 2738, respectively). The following general
observations were made: (1) Known markers of EMT were over-expressed in the Snail+ cells.
(2) Other structural/motility proteins consistent with an EMT phenotype were also
over-expressed in the Snail+ cells.
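
As a rough reading of these counts, the overlap between the two conditions can be estimated by inclusion-exclusion. The sketch below assumes that the overall figure of 2809 is the union of the protein groups seen in the control and Snail+ cells; the report does not state this explicitly, so the derived numbers are illustrative only.

```python
# Counts reported in the text.
total_groups = 2809    # protein groups identified overall (assumed to be the union)
control_groups = 2229  # protein groups identified in control cells
snail_groups = 2738    # protein groups identified in Snail+ cells

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared = control_groups + snail_groups - total_groups
control_only = total_groups - snail_groups
snail_only = total_groups - control_groups

print(shared, control_only, snail_only)
```

Under that assumption, most protein groups would be detected in both conditions, with a larger condition-specific set in the Snail+ cells.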

To augment our ability to identify proteins relevant to the molecular pathogenesis of lung cancer
across the broadest possible patient population, we will perform shotgun proteomics on
additional samples. A panel of bronchial epithelial cells isolated from patients and engineered to
over-express Snail has been plated for western blot and anchorage-independent growth (AIG)
assays. Via these assays, we will re-confirm their Snail expression and Snail-driven malignant
conversion prior to their proteomic evaluation. Each of the cell types in this panel has previously
demonstrated numerous Snail-driven cancer-associated phenotypes, including EMT, stemness,
AIG, and/or tumor growth and metastatic behavior in mice. At the conclusion of these assays,
cells maintained in culture in parallel will be collected, prepared for shotgun proteomics, and
analyzed, as previously described. Additionally, tumor-initiating Snail+ALDH+CD44+CD24-
bronchial epithelial cells will be subjected to shotgun proteomics in the same manner. This
relatively rare cell type will be isolated by fluorescence-activated cell sorting (FACS), and the
resulting cell pellets will be frozen. Multiple pellets will be pooled to generate material sufficient
for evaluation; this again models laser capture microdissection (LCM) isolation of
stem/progenitor target cells from normal/SM/SCC and normal/AAH/ADC regions of archived
clinical specimens.

By  evaluating  additional  samples,  including  rare  tumor-initiating  stem  cells,  we  anticipate 
arriving  at  a  more  robust  protein  signature  relevant  to  lung  carcinogenesis.  Models  and  software 
developed  in  the  Jim  Ayers  Institute  at  Vanderbilt  are  more  appropriately  applied  to  studies  with 
these  multiple  inputs.  The  new  protein  signature  that  emerges  will  be  further  strengthened  via 
Multiple  Reaction  Monitoring  (MRM)  performed  on  the  remaining  samples  by  the  Vanderbilt 
group. MRM using mass spectrometry is a highly sensitive and selective method for the
targeted quantitation of protein or peptide abundances in complex samples. While shotgun
proteomics detects all protein changes in the sample in an unfocused fashion, MRM is targeted
and highly selective, allowing us to specifically look for proteins of interest.

To this end, we have generated a list of candidate proteins for MRM utilizing shotgun proteomic,
mRNA array, and miRNA array datasets generated from the same Snail+ cells. Candidates with
the greatest fold change and level of significance were included. Candidates at the intersection
of each of these lists, as evaluated by Ingenuity Pathway Analysis (IPA), were also included.
Finally, additional candidates of interest were included based on our hypothesis-driven studies
of lung carcinogenesis, including mediators of inflammation, EMT, stemness, metabolism, and
apoptosis-resistance, as evaluated in the PIs’ lab-based studies over the past several years.
While we have already generated this candidate list for MRM, the list will be further refined as
we expand our shotgun proteomic analysis to include additional samples.
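
The three inclusion rules above can be illustrated with a short Python sketch. It is not the procedure actually used (the intersection step in the report was evaluated with IPA), and it assumes, purely for illustration, that all three datasets have been mapped to a common set of gene symbols. The symbols, fold changes, p-values, and thresholds are all hypothetical.

```python
def top_candidates(dataset, min_abs_fold, max_p):
    """Symbols with a large fold change and a small p-value.

    dataset: dict mapping symbol -> (signed fold change, p-value)
    """
    return {
        symbol
        for symbol, (fold, p_value) in dataset.items()
        if abs(fold) >= min_abs_fold and p_value <= max_p
    }


def build_candidate_list(datasets, min_abs_fold, max_p, hypothesis_driven=()):
    # Rule 1: strongest and most significant changes in any dataset.
    selected = set()
    for dataset in datasets.values():
        selected |= top_candidates(dataset, min_abs_fold, max_p)
    # Rule 2: symbols present in every dataset.
    shared = set.intersection(*(set(d) for d in datasets.values()))
    # Rule 3: hypothesis-driven additions.
    return selected | shared | set(hypothesis_driven)


# Hypothetical inputs; thresholds are illustrative, not from the report.
datasets = {
    "proteomics": {"G1": (4.0, 0.001), "G2": (1.2, 0.2), "G3": (1.1, 0.5)},
    "mrna": {"G2": (1.3, 0.3), "G3": (12.0, 0.01), "G4": (1.0, 0.9)},
    "mirna_targets": {"G2": (1.1, 0.4), "G5": (1.0, 0.8)},
}
candidates = sorted(
    build_candidate_list(datasets, min_abs_fold=2.0, max_p=0.05, hypothesis_driven=["G9"])
)
print(candidates)
```

In this toy input, G1 and G3 enter through the fold-change rule, G2 through the intersection rule, and G9 as a hypothesis-driven addition.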

Finally,  during  the  preceding  funding  period,  we  developed  a  Microsoft  Access  database  with 
the  intent  of  including  an  additional  parameter,  “druggability”,  in  the  selection  of  top  candidates 
for  further  validation  and  detailed  functional  studies.  The  first  iteration  of  the  database  was 
created  by  integrating  the  protein,  mRNA,  and  miRNA  datasets  previously  described  with  a  lung 
cancer-specific  terms  list  (see  Figures).  These  were  then  linked  to  information  regarding 
proteins/genes/miRNAs  for  which  agents  are  in  use  or  in  the  pipeline  along  with  additional 
clinical  utility  parameters,  such  as  how  successful  the  agent  is  and  its  range  of  use.  This 
database  will  be  refined  as  we  expand  our  shotgun  proteomic  analysis  to  additional  samples 
and  as  we  receive  inputs  from  the  sequencing  and  array  studies  in  the  other  aims.  This 
database  will  serve  as  an  important  new  tool  for  selecting  the  best  protein  candidates  to  include 
in  our  upcoming  MRM  studies. 
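
The linking step described above, in which candidates are joined to information about available agents, can be sketched with Python's built-in `sqlite3` module standing in for Microsoft Access. The table names, column names, and rows below are hypothetical and do not reflect the actual database schema.

```python
import sqlite3

# In-memory database used as a stand-in for the Access file.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE candidates (symbol TEXT, dataset TEXT, fold_change REAL);
    CREATE TABLE agents (symbol TEXT, agent TEXT, status TEXT);
    """
)

# Hypothetical candidates drawn from the three dataset types.
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [
        ("G1", "shotgun proteomics", 4.0),
        ("G2", "mRNA array", 12.0),
        ("G3", "miRNA array", 1.3),
    ],
)

# Hypothetical druggability information.
conn.executemany(
    "INSERT INTO agents VALUES (?, ?, ?)",
    [
        ("G1", "agent-A", "in use"),
        ("G3", "agent-B", "in pipeline"),
    ],
)

# Candidates for which an agent is in use or in the pipeline.
rows = conn.execute(
    """
    SELECT c.symbol, c.dataset, a.agent, a.status
    FROM candidates AS c
    JOIN agents AS a ON a.symbol = c.symbol
    ORDER BY c.symbol
    """
).fetchall()
conn.close()
print(rows)
```

An inner join keeps only candidates with a matching agent; a left join would instead retain every candidate and leave the agent columns empty where none is known.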


[Figure: screenshots of the three Snail datasets. Panels: a list of proteins (Shotgun Proteomics / All with Significant P-Values) with Ensembl gene IDs, symbols, and protein names; a list of genes from the mRNA array analysis (>10-Fold Change) with probe set IDs, symbols, gene names, and fold changes; and a list of miRNAs from the miRNA array analysis (>1.13-Fold Change) with p-values, signal values, and fold changes.]

Snail datasets used to build ‘druggable’ candidate selection Microsoft
Access database. Datasets include mRNA array, miRNA array, and
shotgun proteomics results.
Lung Cancer Terms:

Non-small  cell  lung  cancer  (NSCLC) 

Atypical  adenomatous  hyperplasia  (AAH) 
Squamous  cell  carcinoma 
Squamous  metaplasia  (SM) 

Premalignancy 

Chronic  obstructive  pulmonary  disease  (COPD) 
Idiopathic  pulmonary  fibrosis  (IPF) 

FEV1 

FEV1/FVC  ratio 
Tiffeneau  index 
Asthma 
Emphysema 
Smoking 

Chronic  inflammation 
Airway  destruction 

Epithelial-to-mesenchymal transition (EMT)
Plasticity 

Tumor  initiating  stem  cells 
Progenitor  cells 
Stem  cell  niche 
Microenvironment 
Immune  suppression 
Immune  escape 
Immune  surveillance 
Autophagy 

Cancer-associated  fibroblasts 

Apoptosis 

Angiogenesis 

Senescence 

Contact  inhibition 

Immortality 


Telomerase 

Anchorage  independent  growth 

Malignant  conversion 

Motility 

Migration 

Invasion 

Metastasis 

MicroRNA 

Non-coding  RNA 

Small  Non-coding  RNA 

Transcription  factor 

Zinc-finger  transcription  factor 

Transcriptional  repressor 

Chemoprevention 

Inflammatory  mediators 

Phosphorylation 

DNA  binding 

Tumor  suppressor 

Somatic  mutation 

Loss-of-function 

Alternative  splicing 

RNA  interference 

Protein-protein  interactions 

Tyrosine  kinase  receptor 

G-protein  coupled  receptor 

Seven-transmembrane  receptor 

Glycolysis 

Oxidative  stress 

Reactive  oxygen  species 

Drug  resistance 

Exosomes 

Circulating  microRNA 
Parallel  progression  model 


Lung  cancer  terms  list  used  to  build 
‘druggable’  candidate  selection  database. 


To select genes in the "Shotgun Proteomics" data:

• Right-click on the dataset.

• Then click on "Is Selected".

• The shotgun proteomics box is a checkbox. It is a yes/no selection.

• This maneuver will select all genes in the original shotgun proteomics Excel spreadsheet.


Before selecting, there were 42073 genes to scroll through.

It is now possible to use the arrows to scroll through just the 69 genes in the proteomics dataset.
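
The yes/no checkbox filter described above can be illustrated outside Microsoft Access. The following is only a minimal sketch, assuming a single gene table with one 0/1 flag column per dataset. The table name `genes`, the column names, and the placeholder symbols `GENE_A` to `GENE_D` are hypothetical and are not taken from the actual database; SQLite stands in for Access purely for illustration.

```python
import sqlite3

# Hypothetical flag columns, one per source dataset (assumed names).
ALLOWED_FLAGS = {"in_mrna_array", "in_mirna_array", "in_shotgun_proteomics"}


def build_demo_db():
    """Create an in-memory table with placeholder genes and yes/no flags."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes ("
        "symbol TEXT PRIMARY KEY, "
        "in_mrna_array INTEGER, "
        "in_mirna_array INTEGER, "
        "in_shotgun_proteomics INTEGER)"
    )
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?, ?)",
        [
            ("GENE_A", 1, 0, 1),  # placeholder rows, not real data
            ("GENE_B", 1, 0, 0),
            ("GENE_C", 0, 0, 1),
            ("GENE_D", 1, 0, 0),
        ],
    )
    return conn


def select_flagged(conn, flag_column="in_shotgun_proteomics"):
    """Return the gene symbols whose yes/no flag is set for one dataset."""
    # Column names cannot be bound as SQL parameters, so check a whitelist.
    if flag_column not in ALLOWED_FLAGS:
        raise ValueError(f"unknown flag column: {flag_column}")
    rows = conn.execute(
        f"SELECT symbol FROM genes WHERE {flag_column} = 1 ORDER BY symbol"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(select_flagged(conn))  # genes flagged in the proteomics dataset
```

Filtering on the flag narrows the full gene list to the members of one dataset, which is the same effect as the reduction from 42073 genes to 69 described above.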


Creation  of  Microsoft  Access  database.  Selection  of  datasets 
to  overlay  and  subsequent  narrowing  of  candidate  list. 


Entrez  Info  -  Gray 
Entrez  Summary  -  Blue 
Comments - Orange
Druggability - Red
Data  -  Pink 

Literature  Links  -  White 


The  data  submitted  by 
the  Dubinett  Lab: 
mRNA candidates,
miRNA candidates,
and protein candidates.


All  fields  locked  except 
Comments  Section.  User  can 
use  this  section  for  notes  and  to 
put  in  links  to  interesting 
abstracts  or  web  pages. 


Creation  of  Microsoft  Access  database.  Dataset  integration 
and  candidate  selection  based  on  “druggability”. 
Specific Aim 3: Test airway-based mRNA and microRNA biomarkers for diagnosing lung cancer in current and former smokers at high risk for lung cancer in minimally invasive sites.


The  studies  on  this  Aim  will  be  carried  out  in  Years  3  and  4  of  the  grant. 

In collaboration with Dr. Liebler, we have developed a series of MRM assays that we can use to test the airways of individuals for lung cancer. The method developments have recently been published in MCP 9,10. The validation effort proposed in Aim 3 has not formally started because we have not settled on candidates to be tested by MRM.

KEY  RESEARCH  ACCOMPLISHMENTS 

1. Collected matched epithelial cells from the nose, proximal and distal airways and tumor/adjacent normal lung from 35 patients undergoing surgical resection of lung lesions at all four participating institutions.

2. Isolated RNA from all airway samples collected and began profiling RNA using RNA-Seq (20 cases, 156 samples) and microarray (28 cases, 183 samples) platforms in order to characterize the spatial map of the molecular field of injury in NSCLC patients and cancer-free individuals.

3.  Performed  RNA  sequencing  on  cell  populations  in  matched  sets  of  histologically  normal 
airway,  premalignant  lesions  and  tumors  from  the  same  individuals,  and  identified  candidate 
genes  that  increase  in  expression  in  premalignancy  and  in  tumors.  We  identified  candidates 
that  we  are  validating  by  PCR  and  will  validate  by  MRM  analysis. 

4. Identified and shared a set of preinvasive lesions with their matched normal tissue and invasive tumors to compare molecular profiles of tumor-initiating stem/progenitor cells from these groups and evaluate the role of airway epithelium tumor-initiating stem/progenitor cells in lung cancer pathogenesis in current and former smokers.

5.  Established  MRM  assays  for  candidate  biomarkers  of  early  lung  cancer. 


REPORTABLE  OUTCOMES 


Abstracts: 

Ooi  AT,  Gower  AC,  Zhang  KX,  Vick  J,  Caballero  N,  Massion  PP,  Wistuba  II,  Walser  TC, 
Dubinett  SM,  Pellegrini  M,  Lenburg  ME,  Spira  A  and  Gomperts  BN.  Molecular  Profiles  to 
Improve  our  Understanding  of  Lung  Cancer  Pathogenesis  in  U.S.  Veterans.  NIH  Lung 
Cancer  SPORE  Meeting.  Pittsburgh.  July  2012. 


CONCLUSIONS 


During our second year of research, we collected epithelial samples throughout the respiratory tract from smokers with and without lung cancer using common SOPs across all 4 participating institutions, and we initiated whole-genome gene-expression profiling of these samples using both RNA-seq and microarrays. We also used a unique approach to profile cell populations from the normal airway, premalignant lesions and tumors, and were able to validate the candidate genes identified. We have established the proteomics methods required to validate our candidates in bronchial specimens during years 3 and 4 of the award. Both the spatial mapping and the premalignant tissue studies are expected to yield airway biomarkers for lung cancer to be tested in future aims of this project.